Skip to content

[refine](column) enforce nullable nested types for array and map#62491

Open
Mryange wants to merge 1 commit intoapache:masterfrom
Mryange:enforce-nullable-nested-types-for-array-map
Open

[refine](column) enforce nullable nested types for array and map#62491
Mryange wants to merge 1 commit intoapache:masterfrom
Mryange:enforce-nullable-nested-types-for-array-map

Conversation

@Mryange
Copy link
Copy Markdown
Contributor

@Mryange Mryange commented Apr 14, 2026

What problem does this PR solve?

Problem Summary:

This PR makes the nested types inside Array and Map explicitly nullable in BE type implementations, instead of relying on implicit caller-side conventions.

  • DataTypeArray now always stores nullable nested element type
  • DataTypeMap now always stores nullable key/value types
  • DataTypeArraySerDe and DataTypeMapSerDe are updated to follow the same invariant

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Copy link
Copy Markdown
Contributor

Thearas commented Apr 14, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented Apr 14, 2026

run buildall

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented Apr 14, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one blocking issue.

  1. be/src/core/data_type/data_type_array.cpp: forcing every array child through make_nullable() changes the FE-visible type metadata that BE exports. DataTypeArray::to_protobuf() now always reports contains_null=true, and FE's FoldConstantRuleOnBE.convertToNereidsType() reconstructs Nereids ArrayType from that flag. That means BE constant folding can no longer round-trip ARRAY<... NOT NULL> precisely. A concrete case is CAST([1] AS ARRAY<INT NOT NULL>): after folding, FE will see ARRAY<INT> instead of the original non-null-child type. FE still treats ArrayType.containsNull as semantically significant in exact matching (fe/fe-type/.../ArrayType.java, Type.matchExactType()), so this is a real behavior regression, not just an internal refactor.

Critical checkpoint conclusions:

  • Goal of the task: standardize BE array/map nested types as nullable. The code mostly does that, but it does not preserve the existing FE-visible array child nullability contract on the BE->FE constant-folding path, so the goal is not achieved safely end-to-end.
  • Minimality/focus: the patch is small and focused, but it misses one required follow-through path (PTypeDesc export / FE reconstruction).
  • Concurrency: no concurrency concerns in the touched code.
  • Lifecycle/static init: no special lifecycle concerns.
  • Config changes: none.
  • Compatibility: this changes transmitted complex-type metadata behavior for arrays and is not compatibility-safe as written.
  • Parallel code paths: I checked the related SerDe/cast/type-factory paths. The array protobuf path is the one that remains inconsistent with FE expectations.
  • Special condition checks: the new DORIS_CHECK assertions are reasonable for invariant enforcement.
  • Test coverage: insufficient. There is no regression test covering FE<->BE round-trip of ARRAY<... NOT NULL> through BE constant folding or protobuf type export.
  • Test result changes: none.
  • Observability: not needed for this refactor.
  • Transaction/persistence/data-write concerns: none.
  • FE/BE variable passing: no new fields, but existing transmitted type metadata is affected and not preserved correctly.
  • Performance: no material issue identified.
  • Other issues: none beyond the blocking metadata regression above.

Because of the array type-metadata regression, I am requesting changes rather than approving.


DataTypeArray::DataTypeArray(const DataTypePtr& nested_) : nested {nested_} {}
DataTypeArray::DataTypeArray(const DataTypePtr& nested_) {
DataTypePtr nullable_nested = make_nullable(nested_);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrapping every array child in make_nullable() changes the type metadata you export later. DataTypeArray::to_protobuf() uses nested->is_nullable(), and FE's FoldConstantRuleOnBE.convertToNereidsType() reconstructs ArrayType from that flag. After this change, a folded expression like CAST([1] AS ARRAY<INT NOT NULL>) will round-trip back to FE as ARRAY<INT> because BE now always reports contains_null=true. FE still treats ArrayType.containsNull as part of exact type matching, so this is a real regression in the constant-folding path, not just an internal invariant cleanup. Please preserve the original child-nullability flag on the exported type metadata path or add a compatibility fix/test for it.

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented Apr 14, 2026

run buildall

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented Apr 15, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking findings:

  1. fe/fe-type/src/main/java/org/apache/doris/catalog/ArrayType.java:80
    This PR no longer preserves previously stored containsNull values in FE catalog types. After replay/deserialization, every runtime consumer now observes true (getContainsNull(), matchesType(), toSql(), toThrift()), so older metadata or external schemas that carried containsNull=false silently change shape when FE reuses or re-serializes them.

  2. fe/fe-type/src/main/java/org/apache/doris/catalog/MapType.java:98 and :105
    MapType now flattens both nested nullability bits to true as well. That removes information still consumed by downstream FE integrations such as Iceberg/Paimon schema conversion and FE->BE thrift export, so required nested fields can no longer be distinguished from optional ones.

Critical checkpoint conclusions:

  • Goal of current task: Make BE/FE complex nested types consistently nullable. The BE-side invariant change looks aligned with that goal, but the FE catalog changes go further and break compatibility for previously stored nested-nullability metadata. No test demonstrates replay/round-trip safety.
  • Change scope: Not as small/focused as it should be. Besides internal type normalization, it changes FE catalog equality, SQL rendering, thrift serialization, and external integration behavior.
  • Concurrency: No meaningful concurrency or lock-order issues in the touched paths.
  • Lifecycle/static initialization: No special lifecycle or static-init issues found.
  • Configuration: No config changes.
  • Compatibility: Blocking issue. Persisted/catalog/thrift type semantics change without compatibility handling or version gating.
  • Parallel code paths: External catalog conversion paths (notably Iceberg/Paimon) still depend on these FE nullability bits and are affected.
  • Conditional logic/comments: New comments explain the intended invariant, but they do not justify removing compatibility-preserving behavior.
  • Test coverage: Insufficient for a compatibility-sensitive type-system change. Only local unit expectations were updated; no replay, thrift round-trip, or external catalog coverage was added.
  • Modified test results: Updated unit assertions are internally consistent, but they do not prove correctness for persisted metadata or downstream integrations.
  • Observability: No extra observability required.
  • Transaction/persistence: Persistence/serialization semantics are involved on the FE side and are not preserved.
  • Data writes/modifications: I did not find a direct storage-write regression in the touched code.
  • FE/BE variable passing: Existing type metadata sent through thrift changes meaning on the FE side without compatibility handling.
  • Performance: No blocking performance concern found.
  • Other issues: None beyond the two compatibility regressions above.

Requesting changes until the FE catalog types either preserve stored nested-nullability bits, or the PR adds a complete/versioned compatibility migration plus tests that cover replay and downstream integrations.

* Always returns {@code true}. Array elements are always nullable in Doris.
*/
public boolean getContainsNull() {
return containsNull;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks replay/round-trip compatibility for existing FE metadata that was serialized with containsNull=false. The field is still retained for deserialization, but after this change every runtime consumer sees true, so replayed types silently change shape and FE can no longer faithfully re-emit the original schema through matchesType(), toSql(), or thrift. If the goal is only to normalize BE internals, the FE catalog type still needs to preserve the stored bit or introduce an explicit compatibility migration.

TTypeNode node = new TTypeNode();
container.types.add(node);
Preconditions.checkNotNull(keyType);
Preconditions.checkNotNull(valueType);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This forces both map nested-nullability bits to true on every FE->BE thrift serialization. That is a compatibility regression for existing FE metadata / external schemas that still carry isKeyContainsNull=false or isValueContainsNull=false, and it also removes information still consumed by downstream FE integrations such as DorisTypeToIcebergType and DorisToPaimonTypeVisitor. Please keep the stored bits intact unless a versioned migration updates every persisted schema and its readers together.

@Mryange Mryange force-pushed the enforce-nullable-nested-types-for-array-map branch from c589160 to 453b7d4 Compare April 15, 2026 05:59
@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented Apr 15, 2026

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 73.68% (14/19) 🎉
Increment coverage report
Complete coverage report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants